Trigger: Music Therapy is an evidence-based practice in psychology, which uses music as a method for the production of "happy feelings".
Explore: Whether there are any correlations between an individual's music taste/preferences and their mental health.
Do people with a specific type of mental health problem share a specific music taste? Is that music taste helping them heal, or worsening their problems?
For a specific type of mental health problem, which music genre should be considered for therapy?
Survey Structure:
Part 1: General background questions about the responders and their music habits.
Part 2: Frequency of listening to given music genres.
Part 3: Mental Health conditions
import pandas as pd
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
# suppress warning messages
import warnings
warnings.filterwarnings('ignore')
survey = pd.read_csv('mxmh_survey_results.csv')
survey.head(3) # first 3 rows
# data shape
survey.shape
# check columns
survey.info()
# count missing values per column (each True counts as 1, each False as 0)
survey.isnull().sum()
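The same check can also report each column's share of missing rows, which is what decides between the imputation strategies used below (mean or mode for a few gaps, a per-genre mean for BPM). This is an illustrative sketch only: the toy DataFrame and the helper name missing_report are hypothetical, and on the real data you would call missing_report(survey).

```python
import numpy as np
import pandas as pd

# Made-up data standing in for survey
toy = pd.DataFrame({
    'Age': [18, np.nan, 25, 30],
    'BPM': [120, np.nan, np.nan, 90],
    'Fav genre': ['Rock', 'Pop', 'Rock', 'Jazz'],
})

def missing_report(df):
    """Count and percentage of missing values per column, most-missing first."""
    counts = df.isnull().sum()
    pct = (counts / len(df) * 100).round(1)
    report = pd.DataFrame({'missing': counts, 'pct': pct})
    return report.sort_values('missing', ascending=False)

report = missing_report(toy)
print(report)
```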
'''1. Age imputation (numeric) --> fill in with column mean'''
age_mean = round(survey['Age'].mean())
survey['Age'] = survey['Age'].fillna(age_mean)
'''2. Primary streaming service (categorical) --> fill with mode'''
service_mode = survey['Primary streaming service'].mode()[0] # mode() returns a Series; take its first value
survey['Primary streaming service'] = survey['Primary streaming service'].fillna(service_mode)
'''3. While working (categorical) --> fill with mode'''
work_mode = survey['While working'].mode()[0]
survey['While working'] = survey['While working'].fillna(work_mode)
'''4. Instrumentalist --> mode'''
inst_mode = survey['Instrumentalist'].mode()[0]
survey['Instrumentalist'] = survey['Instrumentalist'].fillna(inst_mode)
'''5. Composer --> mode'''
comp_mode = survey['Composer'].mode()[0]
survey['Composer'] = survey['Composer'].fillna(comp_mode)
'''6. Foreign languages --> mode'''
lang_mode = survey['Foreign languages'].mode()[0]
survey['Foreign languages'] = survey['Foreign languages'].fillna(lang_mode)
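The five mode imputations above repeat one pattern, so they could also be written as a loop over column names. A minimal sketch on hypothetical toy data; with the real DataFrame, mode_cols would list 'Primary streaming service', 'While working', 'Instrumentalist', 'Composer' and 'Foreign languages'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; in the notebook these columns belong to survey
toy = pd.DataFrame({
    'While working': ['Yes', np.nan, 'Yes', 'No'],
    'Composer': ['No', 'No', np.nan, 'Yes'],
})

# Categorical columns to impute with their mode (most frequent value)
mode_cols = ['While working', 'Composer']

for col in mode_cols:
    # mode() skips NaN and returns a Series (ties are possible); take the first value
    toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy)
```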
Problem:
"Music effects" is the main response variable, so its missing values cannot be filled in with other people's answers.
Solution:
Define a NEW LABEL "Uncertain" for missing responses
# check unique values of a column
survey['Music effects'].unique()
'''7. Music effects --> new label'''
survey['Music effects'] = survey['Music effects'].fillna('Uncertain')
# survey.isnull().sum()
Problem:
BPM (beats per minute) is an optional question in the survey, so 107 out of 736 values are missing. Because this share (about 14.5%) is high, the overall column mean is not a suitable fill value.
Solution:
Music of the same genre tends to fall within a similar BPM range, so we can compute the average BPM for each genre and fill in each missing value with the average for that row's favorite genre.
# rows with BPM missing
bpm_na = survey[survey['BPM'].isna()]
# which genres of bpm to check?
bpm_na_genre = list(bpm_na['Fav genre'].unique())
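# handle BPM outliers: set them aside, impute, then add them back
'''
1. find and drop LARGE outliers first, then add back
Syntax: first find the matching row labels with .index,
then drop by index to remove those whole rows
'''
# keep a copy of these rows to add back later
bpm_out1 = survey[survey['BPM']>=500].copy()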
# drop LARGE outlier rows
bpm_index1 = survey[survey['BPM']>=500].index
survey = survey.drop(bpm_index1, axis=0)
'''
2. find and drop SMALL outliers first, then add back
'''
# keep a copy of these rows to add back later
bpm_out2 = survey[survey['BPM']==0].copy()
# drop 0 outlier rows
bpm_index2 = survey[survey['BPM']==0].index
survey = survey.drop(bpm_index2, axis=0)
'''Generate a dictionary to MAP genres and their BPM'''
bpm_dict = {}
for index, row in survey.iterrows():
    genre = row['Fav genre']
    bpm = row['BPM']
    if pd.notna(bpm):
        if genre not in bpm_dict:
            bpm_dict[genre] = {'total_bpm': 0, 'count': 0}
        bpm_dict[genre]['total_bpm'] += bpm
        bpm_dict[genre]['count'] += 1
# round to 1 decimal place
mean_bpm = {genre: round(data['total_bpm']/data['count'], 1) for genre, data in bpm_dict.items()}
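The dictionary loop above computes a per-genre mean BPM by hand. Assuming the same column names, pandas groupby gives the same result more compactly; the toy data below is made up for illustration. dropna() removes genres whose BPM values are all missing, so they are absent from the dictionary just as in the loop, and their rows stay NaN after filling.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for survey (after the outliers were dropped)
toy = pd.DataFrame({
    'Fav genre': ['Rock', 'Rock', 'Pop', 'Pop', 'Jazz'],
    'BPM': [120.0, 140.0, 100.0, np.nan, np.nan],
})

# Per-genre mean BPM (NaN skipped), rounded to 1 decimal; dropna() removes
# genres with no BPM values at all, as the dictionary loop does
mean_bpm = toy.groupby('Fav genre')['BPM'].mean().round(1).dropna().to_dict()

# Fill each missing BPM with the mean of that row's genre
toy['BPM'] = toy['BPM'].fillna(toy['Fav genre'].map(mean_bpm))
print(mean_bpm)
print(toy)
```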
'''8. BPM --> fill in NA with genre mean'''
survey['BPM'] = survey['BPM'].fillna(survey['Fav genre'].map(mean_bpm))
'''Replace the outliers with their genre mean first, then add them back'''
bpm_out1.loc[bpm_out1['BPM']>=500, 'BPM'] = bpm_out1['Fav genre'].map(mean_bpm)
bpm_out2.loc[bpm_out2['BPM']==0, 'BPM'] = bpm_out2['Fav genre'].map(mean_bpm)
'''ADD BACK: concatenate the tables'''
survey = pd.concat([survey, bpm_out1, bpm_out2], axis=0)
# reset index
survey = survey.reset_index(drop=True)
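Dropping the outlier rows and concatenating them back moves those rows to the end of the DataFrame. An alternative, not the approach used above, is to mark the outliers as missing with Series.mask so that they are imputed together with the other gaps and keep their original position. A sketch on made-up data, reusing the thresholds from above (a BPM of 0 or at least 500):

```python
import numpy as np
import pandas as pd

# Made-up data standing in for survey
toy = pd.DataFrame({
    'Fav genre': ['Rock', 'Rock', 'Rock', 'Pop', 'Pop'],
    'BPM': [120.0, 999999.0, 0.0, 100.0, np.nan],
})

# Treat implausible values (0 or >= 500, the thresholds used above) as missing
bad = (toy['BPM'] == 0) | (toy['BPM'] >= 500)
toy['BPM'] = toy['BPM'].mask(bad)

# Per-genre mean computed only from the remaining valid values, broadcast to every row
genre_mean = toy.groupby('Fav genre')['BPM'].transform('mean').round(1)
toy['BPM'] = toy['BPM'].fillna(genre_mean)
print(toy)
```

Because no rows are dropped or concatenated, the index is unchanged and no reset_index is needed.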
# descriptive statistics
survey['Age'].describe()
'''Check Age Distribution'''
fig = px.histogram(survey['Age'],
# number of bins
nbins=40,
title="Distribution of Music Therapy Responders Age")
# fine-tune the layout
fig.update_layout(
width=800, height=600,
# add gaps between bins
bargap=0.05,
# horizontal (x) and vertical (y) position of the title
title_x=0.5,
title_y=0.9,
# font size
font=dict(size=12),
# x- and y-axis labels
xaxis_title='Age', yaxis_title='Number of Survey Responders',
# set x ticks: tick values
xaxis=dict(tickvals=list(range(10, 90, 5))),
plot_bgcolor='white'
)
fig.show()
# remove age outliers (responders older than 75)
Age_index1 = survey[survey['Age']>75].index
survey = survey.drop(Age_index1, axis=0)
# reset index
survey = survey.reset_index(drop=True)
Most survey responders are in the 14-35 age group.
fig = px.histogram(survey['Hours per day'],
nbins=40,
title="Distribution of music listening hours of participants")
fig.update_layout(
bargap=0.05,
title_x=0.5,
title_y=0.9,
font=dict(size=12),
xaxis_title='Hours per day', yaxis_title='Number of Survey Responders',
xaxis=dict(tickvals=list(range(0, 100, 5))),
plot_bgcolor='black',
height=600, width=800
)
fig.show()
'''Remove outliers in Hours (clean >12)'''
hour_out = survey[survey['Hours per day']>12].index
survey = survey.drop(hour_out, axis=0)
# reset index
survey = survey.reset_index(drop=True)
Most people listen to music about 1-5 hours per day.
'''Syntax: dataset['column name'].value_counts() returns the number of occurrences of each unique value'''
service_count = survey['Primary streaming service'].value_counts()
service_count
'''Interactive Pie Chart'''
# display names; must be listed in the same order as service_count.index
service_name=['Spotify', 'YouTube Music', 'CD/Tape', 'Apple Music', 'Other', 'Pandora']
fig3 = px.pie(service_count,
# values shown in the chart: number of responders per service
values=service_count.values,
# slice labels: service names
names=service_name,
title='Responder Percentage for Music Streaming Services',
color_discrete_sequence=px.colors.sequential.Sunsetdark_r
)
fig3.update_traces(
# show both percentage and label (default: percentage only)
textinfo="percent+label",
# place the text inside the slices
textposition='inside',
textfont_size=13,
# text template: label and percentage on one line
texttemplate="%{label} %{percent}"
)
fig3.update_layout(title_x=0.5, width=600,height=500)
fig3.show()
Spotify is clearly the most popular streaming service, used by more than half of the responders.
# normalize=True returns proportions instead of counts
survey['While working'].value_counts(normalize=True)
'''seaborn countplot: used for comparing between groups'''
# set size
plt.figure(figsize=(8, 6))
sns.countplot(x=survey['While working'],
palette=['salmon', '#7F12AC'])
plt.title('Whether Listening to Music While Working')
plt.show()
79% of the responders listen to music while working.
survey['Instrumentalist'].value_counts()
survey['Composer'].value_counts()
Generate a new feature by combining existing columns based on a condition
# mark a responder as Pro if they are an Instrumentalist, a Composer, or both
'''Syntax: np.where(condition, value_1, value_2):
rows that meet the condition get value_1; all other rows get value_2'''
survey['Pro_temp'] = np.where((survey['Instrumentalist']=='Yes') |
(survey['Composer']=='Yes'),
# 'Yes' if the condition holds, otherwise 'No'
'Yes', 'No')
# insert Pro at position 4, next to the background columns it replaces
'''Insert a new column at a specific position
Syntax: dataset.insert(position, new column name, column values)
'''
survey.insert(4, 'Pro', survey['Pro_temp'])
# drop unneeded columns (including the temporary Pro_temp column)
'''DROP redundant columns: Ins, Comp, Pro_temp, Permissions'''
survey = survey.drop(columns=['Instrumentalist', 'Composer', 'Pro_temp', 'Permissions'],
axis=1)
survey['Pro'].value_counts(normalize=True)
36.55% of the responders have professional music experience, either playing instruments or composing.
genre_count = survey['Fav genre'].value_counts()
'''Interactive Pie Chart'''
fig4 = px.pie(
# values shown in the chart: number of responders per genre
values=genre_count.values,
# slice labels: genre names
names=genre_count.index,
title='Percentage of Responder Preferences for Music Genres',
color_discrete_sequence=px.colors.sequential.Turbo_r
)
fig4.update_traces(
# show both percentage and label (default: percentage only)
textinfo="percent+label",
# place the text inside the slices
textposition='inside',
textfont_size=13,
# text template: label and percentage on one line
texttemplate="%{label} %{percent}"
fig4.update_layout(title_x=0.5, width=600, height=500)
fig4.show()
The top 3 favorite genres among responders are Rock (25.5%), Pop (15.7%) and Metal (12.1%); together these three genres account for more than half (53.3%) of responders' preferences.
Since most responders in this survey are young people aged 14-35, different genres may appeal to different age groups within that range.
'''For each music genre, check the age distribution'''
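# key statistics to compare: median and mean
fig5 = px.box(x=survey['Fav genre'], y=survey['Age'],
# vertical boxes
orientation='v', points='outliers',
# notches around the median (disabled here)
notched=False,
title='Distribution of Ages in Each Music Genre'
)
# boxmean=True would also draw each box's mean; disabled here
fig5.update_traces(boxmean=False)
# fine-tune the layout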
fig5.update_layout(
width=800, height=600,
title_x=0.5, title_y=0.9,
xaxis_title='Favorite Music Genre',
yaxis_title='Age of Responders',
plot_bgcolor='white',
showlegend=False
)
'''Reorder the boxes by genre popularity, matching the pie chart above (for consistency)'''
fig5.update_xaxes(categoryorder='array', categoryarray=genre_count.index)
'''Compute the mean age per genre, used to draw a line over the boxes'''
# group by genre, compute each group's mean age, then connect the means with a line
age_mean = dict(survey.groupby(survey['Fav genre'])['Age'].mean())
agemean_by_genre = {}
# loop over the genres in order of popularity
for i in genre_count.index:
    '''Syntax: dict_name[new_key] = new_value creates a new key-value pair in the dictionary'''
    agemean_by_genre[i] = age_mean[i] # key: value
'''Draw the line connecting the per-genre mean ages'''
fig5.add_trace(
# graph object
go.Scatter(x=list(agemean_by_genre.keys()),
y=list(agemean_by_genre.values()),
mode='lines+markers', name='Mean Age',
marker=dict(color='violet'))
)
fig5.show()
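The dictionary loop that orders the mean ages by genre popularity can also be expressed with groupby plus reindex. A sketch on hypothetical toy data; the same pattern would apply to the mean listening hours used for fig6.

```python
import pandas as pd

# Made-up data standing in for survey
toy = pd.DataFrame({
    'Fav genre': ['Rock', 'Rock', 'Pop', 'Jazz', 'Rock', 'Pop'],
    'Age': [20, 30, 18, 40, 25, 22],
})

# Genres ordered by popularity (value_counts sorts by count, descending)
genre_count = toy['Fav genre'].value_counts()

# Mean age per genre, reordered to follow the popularity order
agemean_by_genre = toy.groupby('Fav genre')['Age'].mean().reindex(genre_count.index)
print(agemean_by_genre)
```

The resulting Series could be passed to go.Scatter as x=agemean_by_genre.index and y=agemean_by_genre.values.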
'''Distribution of Listening Hours by Favorite Genre'''
fig6 = px.box(survey, x='Fav genre', y='Hours per day',
title='Distribution of Listening Hours by Favorite Genre',
labels={'Fav genre': 'Favorite Genre', 'Hours per day': 'Hours per Day'}
)
fig6.update_traces(boxmean=False)
fig6.update_layout(
width=800, height=600,
title_x=0.5, title_y=0.9,
plot_bgcolor='white',
showlegend=False
)
fig6.update_xaxes(categoryorder='array', categoryarray=genre_count.index)
LH_mean = dict(survey.groupby(survey['Fav genre'])['Hours per day'].mean())
LHmean_by_genre = {}
for i in genre_count.index:
    LHmean_by_genre[i] = LH_mean[i]
fig6.add_trace(
go.Scatter(x=list(LHmean_by_genre.keys()),
y=list(LHmean_by_genre.values()),
mode='lines+markers', name='Mean Hours',
marker=dict(color='violet')
))
fig6.show()
'''SUBPLOT'''
fig7 = plt.figure(figsize=(10, 6))
# super title
plt.suptitle('\nListening Habits', fontsize=14)
# subplot position: 1st cell of a 1-row, 2-column grid
ax = fig7.add_subplot(1, 2, 1)
explore = survey['Exploratory'].value_counts()
explore.plot(kind='pie', colors=['tomato', 'dodgerblue'], ax=ax)
# 2nd cell of the same grid
ax = fig7.add_subplot(1, 2, 2)
foreign = survey['Foreign languages'].value_counts()
foreign.plot(kind='pie', colors=['tomato', 'dodgerblue'], ax=ax)
fig7.show()
'''Replace values by content. Syntax: dataset.replace([old values, e.g. 'Never'], [new values, e.g. 0])'''
survey = survey.replace(['Never', 'Rarely', 'Sometimes', 'Very frequently'],
[0, 1, 2, 3])
'''Calculate number of people in each frequency group by genre'''
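gen_freq_dict = {}
# the 'Frequency [<genre>]' columns, selected by name rather than by position
freq_cols = [col for col in survey.columns if col.startswith('Frequency [')]
for col in freq_cols:
    # clean the name: strip the 'Frequency [' prefix and the closing ']'
    genre_name = col[len('Frequency ['):-1]
    # count responders per frequency level, ordered 0, 1, 2, 3 (levels nobody chose count as 0)
    genre_freq = survey[col].value_counts().reindex([0, 1, 2, 3], fill_value=0)
    # keep only the counts, without the Series name/dtype
    gen_freq_dict[genre_name] = genre_freq.values.tolist()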
'''Create a DataFrame based on a Dictionary'''
gen_freq_df = pd.DataFrame(gen_freq_dict, columns=gen_freq_dict.keys())
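# label the rows with our own frequency levels 0-3 (instead of the default index)
gen_freq_df.index = [0, 1, 2, 3]
'''Transpose the data so that genre becomes a variable (rows become columns and vice versa)'''
# reset the index so the genre names become a column
gen_freq_df = gen_freq_df.T.reset_index()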
# 修改column names
gen_freq_df.columns = ['genre', 'Never', 'Rarely', 'Sometimes','VeryFrequently']
gen_freq_df
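The genre-by-frequency table can also be built without an explicit loop, by applying value_counts to every frequency column at once. A sketch on hypothetical toy data with the 0-3 codes already applied; reindex fills in levels that no responder chose.

```python
import pandas as pd

# Hypothetical toy data: frequency columns already mapped to 0-3
toy = pd.DataFrame({
    'Frequency [Rock]': [3, 3, 2, 0],
    'Frequency [Jazz]': [0, 1, 1, 2],
})

freq_cols = [c for c in toy.columns if c.startswith('Frequency [')]

# Count responders per level (0-3) for each genre; missing levels become 0
counts = toy[freq_cols].apply(lambda s: s.value_counts().reindex([0, 1, 2, 3], fill_value=0))

# One row per genre, one column per frequency level
gen_freq_df = counts.T
gen_freq_df.index = [c[len('Frequency ['):-1] for c in gen_freq_df.index]
gen_freq_df.columns = ['Never', 'Rarely', 'Sometimes', 'VeryFrequently']
gen_freq_df = gen_freq_df.rename_axis('genre').reset_index()
print(gen_freq_df)
```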
fig10 = px.bar(gen_freq_df,
x='genre',
y=['Never', 'Rarely', 'Sometimes', 'VeryFrequently'],
barmode='group', # bar layout: side by side ('group') or stacked ('stack')
title='Listening Frequency Distribution for Each Genre',
labels={'variable': 'Frequency'},
# color_discrete_sequence=px.colors.sequential.Reds,
color_discrete_sequence=['mistyrose', 'coral', 'tomato', 'firebrick']
)
fig10.update_layout(
width=800, height=600,
title_x=0.5, title_y=0.9,
# xlabel
xaxis_title='Music Genre', yaxis_title='Responders in Each Frequency Group',
plot_bgcolor='dimgrey'
)
fig10.show()
Genres with the most responders who "Never" listen to them: Gospel, Latin and K pop.
Genres with the most responders who listen to them "Very frequently": Rock and Pop.
mental_problem = survey[['Anxiety', 'Depression', 'Insomnia', 'OCD']]
mental_problem.sample(3)
fig11 = px.box(mental_problem,
# horizontal boxes
orientation='h',
# add notches around the median
notched=True,
title='Intensity of Mental Problems for Responders'
)
# draw the mean as a dashed line inside each box
fig11.update_traces(boxmean=True)
fig11.update_layout(
width=800, height=600,
title_x=0.5, title_y=0.9,
# xlabel
xaxis_title='Mental Problem Intensity Score', yaxis_title='Mental Health Problems',
# plot_bgcolor='dimgrey'
)
fig11.show()
Levels of mental problem intensity: Level 1-3 (Mild), Level 4-7 (Moderate), Level 8-10 (Severe)
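These levels can be attached to the scores with pd.cut. A minimal sketch; the 'None' bin for a score of 0 is an assumption, since the definition above does not cover Level 0. The bins are right-closed, so a fractional score such as 3.5 would fall into Moderate.

```python
import pandas as pd

# Example intensity scores on the survey's 0-10 scale
scores = pd.Series([0, 2, 3, 4, 7, 8, 10])

# Right-closed bins: (-1, 0] None (assumed), (0, 3] Mild, (3, 7] Moderate, (7, 10] Severe
levels = pd.cut(scores, bins=[-1, 0, 3, 7, 10],
                labels=['None', 'Mild', 'Moderate', 'Severe'])
print(levels.tolist())
```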
# select the four mental health scores plus favorite genre
mh_genre = survey[['Anxiety', 'Depression', 'Insomnia', 'OCD', 'Fav genre']]
# initialize the combined figure
fig11 = go.Figure()
# Anxiety
fig11.add_trace(
go.Box(
x=mh_genre['Fav genre'],
y=mh_genre['Anxiety'],
marker_color='orangered',
name='Anxiety')
)
# Depression
fig11.add_trace(
go.Box(
x=mh_genre['Fav genre'],
y=mh_genre['Depression'],
marker_color='royalblue',
name='Depression')
)
# Insomnia
fig11.add_trace(
go.Box(
x=mh_genre['Fav genre'],
y=mh_genre['Insomnia'],
marker_color='dodgerblue',
name='Insomnia')
)
# OCD
fig11.add_trace(
go.Box(
x=mh_genre['Fav genre'],
y=mh_genre['OCD'],
marker_color='blueviolet',
name='OCD')
)
fig11.update_layout(
# figure width and height
width=1000, height=500,
# show the boxes side by side (grouped by genre)
boxmode='group',
xaxis_title='Favorite Genre',
yaxis_title='Mental Problem Intensity Score',
title_text='Relationship between Favorite Genre & Mental Problem Intensity',
title_x=0.5, title_y=0.95,
# legend orientation and position
legend=dict(title='Mental Problems', xanchor='left', x=0.2, yanchor='top', y=1.12, orientation='h')
)
fig11.show()
Anxiety
Responders whose favorite genre is Folk have noticeably higher anxiety intensity: almost all are above Level 3 (except one outlier at 0), 75% are above Level 6, and the median is Level 7.5.
Pop listeners have comparatively high anxiety levels, with 75% above Level 5.
Hip hop listeners (all above Level 2, median at Level 7) and Lofi listeners (all above Level 3, median at Level 7) also have high anxiety levels.
Responders who listen to Gospel tend to have lower anxiety, with 75% below Level 6 (moderate).
Depression
Responders who listen to Lofi show the greatest propensity for depression, with 25% above Level 8 and 75% above Level 5.
Responders who listen to Latin and Gospel show noticeably lower depression levels: half of Gospel listeners have a depression level below 1, and the median depression level of Latin listeners is 3.
Insomnia
Responders who listen to Lofi also show high insomnia levels, with a median at Level 6 and 25% above Level 8.
Video game music listeners (median at Level 3.5, 25% above Level 7) and Metal listeners (median at Level 5, 25% above Level 7) also show relatively high insomnia levels.
Responders who listen to Rap generally have lower insomnia intensity, with all below Level 7 and 75% below Level 4.
OCD
Responders who listen to Rap show a relatively high OCD level, with a median at Level 3 and 75% above Level 1.
Responders who listen to Folk generally have lower OCD intensity, with over 90% below Level 5 and a median at Level 1.5.
Overall, responders who listen to Lofi have high levels of all four mental problems, especially depression and insomnia.
Generally, responders who listen to R&B and Rap have relatively low levels of all four mental problems, with all medians below Level 5.
# Group 1: mean listening hours for scores of 8-10 (severe)
anxiety_high = survey[survey['Anxiety']>=8]['Hours per day'].mean()
depression_high = survey[survey['Depression']>=8]['Hours per day'].mean()
insomnia_high = survey[survey['Insomnia']>=8]['Hours per day'].mean()
ocd_high = survey[survey['OCD']>=8]['Hours per day'].mean()
# Group 2: mean listening hours for scores of 3 or below (none or mild)
anxiety_low = survey[survey['Anxiety']<=3]['Hours per day'].mean()
depression_low = survey[survey['Depression']<=3]['Hours per day'].mean()
insomnia_low = survey[survey['Insomnia']<=3]['Hours per day'].mean()
ocd_low = survey[survey['OCD']<=3]['Hours per day'].mean()
'''Create a DataFrame'''
print('\nRelationship between Listening Hours and Mental Problem Intensity\n')
hour_by_mh = pd.DataFrame({'High Intensity': [anxiety_high, depression_high, insomnia_high, ocd_high],
'Low Intensity': [anxiety_low, depression_low, insomnia_low, ocd_low]},
index= mental_problem.columns.tolist())
hour_by_mh
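The eight filters above can be condensed into one table built from a list of conditions. A sketch on made-up numbers that keeps the same thresholds (8 or above for high, 3 or below for low); on the real data, conditions would be mental_problem.columns.tolist() and toy would be survey.

```python
import pandas as pd

# Made-up data standing in for survey
toy = pd.DataFrame({
    'Hours per day': [1.0, 2.0, 6.0, 4.0],
    'Anxiety': [2, 3, 9, 8],
    'Depression': [0, 8, 10, 5],
})

conditions = ['Anxiety', 'Depression']

# Mean listening hours for high (>= 8) vs. low (<= 3) scores of each condition
hour_by_mh = pd.DataFrame({
    'High Intensity': [toy.loc[toy[c] >= 8, 'Hours per day'].mean() for c in conditions],
    'Low Intensity': [toy.loc[toy[c] <= 3, 'Hours per day'].mean() for c in conditions],
}, index=conditions)
print(hour_by_mh)
```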
# initialize the combined figure
fig12 = go.Figure()
# Anxiety
fig12.add_trace(
go.Box(
x=survey['Pro'],
y=survey['Anxiety'],
marker_color='orangered',
name='Anxiety')
)
# Depression
fig12.add_trace(
go.Box(
x=survey['Pro'],
y=survey['Depression'],
marker_color='royalblue',
name='Depression')
)
# Insomnia
fig12.add_trace(
go.Box(
x=survey['Pro'],
y=survey['Insomnia'],
marker_color='dodgerblue',
name='Insomnia')
)
# OCD
fig12.add_trace(
go.Box(
x=survey['Pro'],
y=survey['OCD'],
marker_color='blueviolet',
name='OCD')
)
fig12.update_layout(
# figure width and height
width=800, height=400,
# show the boxes side by side (grouped)
boxmode='group',
xaxis_title='Professional Music Background',
yaxis_title='Mental Problem Intensity Score',
title_text='Relationship between Music Background & Mental Problem Intensity',
title_x=0.5, title_y=0.95,
# legend orientation and position
legend=dict(title='Mental Problems', xanchor='left', x=0.15, yanchor='top', y=1.15, orientation='h')
)
fig12.show()
Responders WITH a professional musical background have a higher median anxiety intensity.
Responders WITHOUT a professional musical background have OCD intensities that are more concentrated at lower levels.
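Both readings of the box plot can be checked numerically by comparing the median and interquartile range (IQR) of each score per group. A sketch on hypothetical toy data; with the real data, toy would be survey and 'Depression' and 'Insomnia' could be added to the column list.

```python
import pandas as pd

# Made-up data standing in for survey
toy = pd.DataFrame({
    'Pro': ['Yes', 'Yes', 'No', 'No', 'No'],
    'Anxiety': [7, 6, 5, 4, 6],
    'OCD': [3, 5, 0, 1, 0],
})

grouped = toy.groupby('Pro')[['Anxiety', 'OCD']]

# Median per musical-background group
medians = grouped.median()
# Interquartile range (Q3 - Q1) as a measure of spread; a smaller IQR means more concentrated scores
iqr = grouped.quantile(0.75) - grouped.quantile(0.25)
print(medians)
print(iqr)
```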
# create a correlation matrix
corr_mh = round(mental_problem.corr(), 2)
fig13 = px.imshow(corr_mh, text_auto=True, color_continuous_scale='Reds')
fig13.update_layout(
width=500, height=500,
title_text='Correlation among Mental Problems',
title_x=0.5
)
fig13.show()
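To read the heatmap as a ranked list, the upper triangle of the correlation matrix (excluding the diagonal) can be stacked so that each pair of conditions appears exactly once. A sketch on made-up data; on the real data, corr would be mental_problem.corr().

```python
import numpy as np
import pandas as pd

# Made-up scores standing in for mental_problem
toy = pd.DataFrame({
    'Anxiety': [1, 2, 3, 4, 5],
    'Depression': [2, 4, 5, 8, 10],
    'OCD': [5, 1, 4, 2, 3],
})

corr = toy.corr()

# Keep the upper triangle only (k=1 also drops the diagonal) so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# One row per pair of conditions, highest correlation first
pairs = upper.stack().dropna().sort_values(ascending=False)
print(pairs)
```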
# Pie Chart for Music Effects
effect = survey['Music effects'].value_counts(normalize=True)
fig14=px.pie(
effect,
values=effect.values,
names=effect.index, # index labels: the effect categories
title='Percentage of Music Effects',
color_discrete_sequence=['dodgerblue', 'pink', 'red', 'yellow']
)
fig14.update_traces(
# show both percentage and label (default: percentage only)
textinfo="percent+label",
# place the text inside the slices
textposition='inside',
textfont_size=13,
# text template: label and percentage on one line
texttemplate="%{label} %{percent}"
)
fig14.update_layout(width=600,height=500, title_x=0.5)
fig14.show()
# divide into groups based on mental health problem
anxiety_group = survey[survey['Anxiety']>3]
depression_group= survey[survey['Depression']>3]
insomnia_group = survey[survey['Insomnia']>3]
ocd_group = survey[survey['OCD']>3]
'''Calculate number of people in each Effect group for each Genre'''
anxiety_effect_list = []
for g in survey['Fav genre'].unique().tolist():
    genre_effect = dict(anxiety_group[anxiety_group['Fav genre']==g].groupby('Music effects').size())
    # print(g, genre_effect)
    anxiety_effect_list.append([g, genre_effect])
anxiety_effect_df = pd.DataFrame(anxiety_effect_list, columns=['Genre', 'Therapy Effect'])
'''Expand column of a dictionary into 4 columns'''
anxiety_effect_df1= anxiety_effect_df['Therapy Effect'].apply(pd.Series)
# fill in NA with 0; modify data type into integer (num of people)
anxiety_effect_df1 = anxiety_effect_df1.fillna(0).astype(int)
# merge two dataframes; DROP 'Therapy Effect'
anxiety_effect_df2 = anxiety_effect_df.join(anxiety_effect_df1).drop('Therapy Effect', axis=1)
# calculate total number of people for each Genre
anxiety_effect_df2.insert(1, 'Sum', anxiety_effect_df2.iloc[:, 1:].sum(axis=1))
'''Feature Engineering: calculate percentage of each effect'''
anxiety_effect_df2['Improve_pct'] = round(anxiety_effect_df2['Improve']/anxiety_effect_df2['Sum']*100, 1)
anxiety_effect_df2['NoEffect_pct'] = round(anxiety_effect_df2['No effect']/anxiety_effect_df2['Sum']*100, 1)
anxiety_effect_df2['Uncertain_pct'] = round(anxiety_effect_df2['Uncertain']/anxiety_effect_df2['Sum']*100, 1)
anxiety_effect_df2['Worsen_pct'] = round(anxiety_effect_df2['Worsen']/anxiety_effect_df2['Sum']*100, 1)
anxiety_effect_df2
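The same per-genre effect table is built four times in this notebook (anxiety, depression, insomnia, OCD). A sketch of a reusable helper based on pd.crosstab, assuming the column names used here; effect_table is a hypothetical name and the toy data is made up. Unlike the dictionary approach, crosstab leaves out genres with no responders in the group instead of giving them a row of zeros.

```python
import pandas as pd

# Made-up data standing in for survey
toy = pd.DataFrame({
    'Fav genre': ['Rock', 'Rock', 'Pop', 'Pop', 'Lofi', 'Jazz'],
    'Anxiety': [6, 8, 5, 2, 9, 7],
    'Music effects': ['Improve', 'Worsen', 'No effect', 'Improve', 'Improve', 'Uncertain'],
})

def effect_table(df, condition, threshold=3):
    """Count and percentage of each 'Music effects' answer per genre,
    among responders whose `condition` score is above `threshold`."""
    group = df[df[condition] > threshold]
    counts = pd.crosstab(group['Fav genre'], group['Music effects'])
    # Make sure all four effect labels exist, even if nobody chose one
    counts = counts.reindex(columns=['Improve', 'No effect', 'Uncertain', 'Worsen'], fill_value=0)
    counts.insert(0, 'Sum', counts.sum(axis=1))
    pct = counts.drop(columns='Sum').div(counts['Sum'], axis=0).mul(100).round(1)
    pct.columns = ['Improve_pct', 'NoEffect_pct', 'Uncertain_pct', 'Worsen_pct']
    return counts.join(pct).rename_axis('Genre').reset_index()

anxiety_table = effect_table(toy, 'Anxiety')
print(anxiety_table)
```

With the real data, effect_table(survey, 'Depression') would give a table with the same column names as depression_effect_df2, ready to pass to px.bar.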
fig15 = px.bar(
anxiety_effect_df2,
x='Genre', y=['Improve', 'No effect', 'Uncertain', 'Worsen'],
title='Therapy Effect on Anxiety by Genre',
labels={'variable':'Therapy Effect'},
color_discrete_sequence=['dodgerblue','pink','lime','red'],
# extra data shown in the hover tooltip
hover_data=['Worsen_pct', 'NoEffect_pct', 'Improve_pct']
)
fig15.update_layout(
width=800, height=500,
title_x=0.5, title_y=0.9,
xaxis_title='Music Genre',
yaxis_title='Responders in Each Effect Group',
legend=dict(xanchor='right', yanchor='top', x=0.99, y=0.99),
plot_bgcolor='white'
)
fig15.show()
Music genres that may worsen anxiety during therapy (share of "Worsen" responses): Video game music (10.8%), Pop (4.3%) and Rock (2.8%).
Music genre that should be recommended to its fans for healing anxiety: Lofi (100% Improve).
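A "100% Improve" share can rest on a handful of responders (the bar heights in the chart show the group sizes), so a ranking is more robust when small groups are filtered out first. A sketch of that filter; the table below uses made-up genres and numbers, not survey results, and the cut-off MIN_N = 20 is an arbitrary assumption.

```python
import pandas as pd

# Made-up per-genre summary shaped like anxiety_effect_df2 (NOT survey results)
table = pd.DataFrame({
    'Genre': ['Genre A', 'Genre B', 'Genre C', 'Genre D'],
    'Sum': [3, 50, 30, 12],
    'Improve_pct': [100.0, 60.0, 80.0, 90.0],
    'Worsen_pct': [0.0, 10.0, 5.0, 0.0],
})

MIN_N = 20  # assumed minimum group size; an arbitrary cut-off, adjust as needed

# Ignore genres with too few responders for a stable percentage
reliable = table[table['Sum'] >= MIN_N]

top_worsen = reliable.sort_values('Worsen_pct', ascending=False)['Genre'].tolist()
top_improve = reliable.sort_values('Improve_pct', ascending=False)['Genre'].tolist()
print(top_worsen, top_improve)
```

Here Genre A reaches 100% Improve from only 3 responders and is therefore excluded from both rankings.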
depression_effect_list= []
for g in survey['Fav genre'].unique().tolist():
    genre_effect = dict(depression_group[depression_group['Fav genre']==g].groupby('Music effects').size())
    depression_effect_list.append([g, genre_effect])
depression_effect_df = pd.DataFrame(depression_effect_list, columns=['Genre', 'Therapy Effect'])
depression_effect_df1= depression_effect_df['Therapy Effect'].apply(pd.Series)
depression_effect_df1 = depression_effect_df1.fillna(0).astype(int)
depression_effect_df2= depression_effect_df.join(depression_effect_df1).drop('Therapy Effect', axis=1)
# calculate total number of people for each Genre
depression_effect_df2.insert(1, 'Sum', depression_effect_df2.iloc[:, 1:].sum(axis=1))
'''Feature Engineering: calculate percentage of each effect'''
depression_effect_df2['Improve_pct'] = round(depression_effect_df2['Improve']/depression_effect_df2['Sum']*100, 1)
depression_effect_df2['NoEffect_pct'] = round(depression_effect_df2['No effect']/depression_effect_df2['Sum']*100, 1)
depression_effect_df2['Uncertain_pct'] = round(depression_effect_df2['Uncertain']/depression_effect_df2['Sum']*100, 1)
depression_effect_df2['Worsen_pct'] = round(depression_effect_df2['Worsen']/depression_effect_df2['Sum']*100, 1)
fig16 = px.bar(
depression_effect_df2,
x='Genre', y=['Improve', 'No effect', 'Uncertain', 'Worsen'],
title='Therapy Effect on Depression by Genre',
labels={'variable':'Therapy Effect'},
color_discrete_sequence=['dodgerblue','pink','lime','red'],
# extra data shown in the hover tooltip
hover_data=['Worsen_pct', 'NoEffect_pct', 'Improve_pct']
)
fig16.update_layout(
width=800, height=500,
title_x=0.5, title_y=0.9,
xaxis_title='Music Genre',
yaxis_title='Responders in Each Effect Group',
legend=dict(xanchor='right', yanchor='top', x=0.99, y=0.99),
plot_bgcolor='white'
)
fig16.show()
Music genres that may worsen depression during therapy (share of "Worsen" responses): Video game music (15.4%), Rock (5.7%) and Pop (4.2%).
Music genres that should be recommended to their fans for healing depression: Lofi and Jazz (100% Improve).
insomnia_effect_list= []
for g in survey['Fav genre'].unique().tolist():
    genre_effect = dict(insomnia_group[insomnia_group['Fav genre']==g].groupby('Music effects').size())
    insomnia_effect_list.append([g, genre_effect])
insomnia_effect_df = pd.DataFrame(insomnia_effect_list, columns=['Genre', 'Therapy Effect'])
insomnia_effect_df1= insomnia_effect_df['Therapy Effect'].apply(pd.Series)
insomnia_effect_df1 = insomnia_effect_df1.fillna(0).astype(int)
insomnia_effect_df2= insomnia_effect_df.join(insomnia_effect_df1).drop('Therapy Effect', axis=1)
# calculate total number of people for each Genre
insomnia_effect_df2.insert(1, 'Sum', insomnia_effect_df2.iloc[:, 1:].sum(axis=1))
'''Feature Engineering: calculate percentage of each effect'''
insomnia_effect_df2['Improve_pct'] = round(insomnia_effect_df2['Improve']/insomnia_effect_df2['Sum']*100, 1)
insomnia_effect_df2['NoEffect_pct'] = round(insomnia_effect_df2['No effect']/insomnia_effect_df2['Sum']*100, 1)
insomnia_effect_df2['Uncertain_pct'] = round(insomnia_effect_df2['Uncertain']/insomnia_effect_df2['Sum']*100, 1)
insomnia_effect_df2['Worsen_pct'] = round(insomnia_effect_df2['Worsen']/insomnia_effect_df2['Sum']*100, 1)
fig17 = px.bar(
insomnia_effect_df2,
x='Genre', y=['Improve', 'No effect', 'Uncertain', 'Worsen'],
title='Therapy Effect on Insomnia by Genre',
labels={'variable':'Therapy Effect'},
color_discrete_sequence=['dodgerblue','pink','lime','red'],
# extra data shown in the hover tooltip
hover_data=['Worsen_pct', 'NoEffect_pct', 'Improve_pct']
)
fig17.update_layout(
width=800, height=500,
title_x=0.5, title_y=0.9,
xaxis_title='Music Genre',
yaxis_title='Responders in Each Effect Group',
legend=dict(xanchor='right', yanchor='top', x=0.99, y=0.99),
plot_bgcolor='white'
)
fig17.show()
Music genre that should be recommended to its fans for healing insomnia: Lofi (100% Improve).
ocd_effect_list= []
for g in survey['Fav genre'].unique().tolist():
    genre_effect = dict(ocd_group[ocd_group['Fav genre']==g].groupby('Music effects').size())
    ocd_effect_list.append([g, genre_effect])
ocd_effect_df = pd.DataFrame(ocd_effect_list, columns=['Genre', 'Therapy Effect'])
ocd_effect_df1= ocd_effect_df['Therapy Effect'].apply(pd.Series)
ocd_effect_df1 = ocd_effect_df1.fillna(0).astype(int)
ocd_effect_df2= ocd_effect_df.join(ocd_effect_df1).drop('Therapy Effect', axis=1)
# calculate total number of people for each Genre
ocd_effect_df2.insert(1, 'Sum', ocd_effect_df2.iloc[:, 1:].sum(axis=1))
ocd_effect_df2['Improve_pct'] = round(ocd_effect_df2['Improve']/ocd_effect_df2['Sum']*100, 1)
ocd_effect_df2['NoEffect_pct'] = round(ocd_effect_df2['No effect']/ocd_effect_df2['Sum']*100, 1)
ocd_effect_df2['Uncertain_pct'] = round(ocd_effect_df2['Uncertain']/ocd_effect_df2['Sum']*100, 1)
ocd_effect_df2['Worsen_pct'] = round(ocd_effect_df2['Worsen']/ocd_effect_df2['Sum']*100, 1)
fig18 = px.bar(
ocd_effect_df2,
x='Genre', y=['Improve', 'No effect', 'Uncertain', 'Worsen'],
title='Therapy Effect on OCD by Genre',
labels={'variable':'Therapy Effect'},
color_discrete_sequence=['dodgerblue','pink','lime','red'],
# extra data shown in the hover tooltip
hover_data = ['Worsen_pct', 'NoEffect_pct', 'Improve_pct']
)
fig18.update_layout(
width=800, height=500,
title_x=0.5, title_y=0.9,
xaxis_title='Music Genre',
yaxis_title='Responders in Each Effect Group',
legend=dict(xanchor='right', yanchor='top', x=0.99, y=0.99),
plot_bgcolor='white'
)
fig18.show()
Music genres that may worsen OCD during therapy (share of "Worsen" responses): Pop (5%) and Rock (3.4%).
Music genres that should be recommended to their fans for healing OCD: Lofi, R&B and Hip hop (100% Improve).
Note for users: install the required visualization package first with !pip install plotly
Package versions used:
plotly 5.18.0
pandas 1.3.5
numpy 1.21.6
matplotlib 3.1.3